{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/colab/healthcare/pipeline_parser/Parser.ipynb)\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#   **📜 PipelineTracer and PipelineOutputParser**"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Starting the session"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Spark NLP Version : 5.3.2\n",
      "Spark NLP_JSL Version : 5.4.0\n",
      "\n",
      "\n",
      "spark session: <pyspark.sql.session.SparkSession object at 0x00000275E1469790>\n"
     ]
    }
   ],
   "source": [
    "from johnsnowlabs import nlp\n",
    "nlp.install()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:24:52.863010Z",
     "start_time": "2024-07-13T04:24:00.920526300Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## PipelineTracer\n",
    "\n",
    "The `PipelineTracer` class traces the stages of a pipeline and retrieves information about them.\n",
    "It reports the entities, assertions, and relations that the pipeline can include, and it works with both `PipelineModel` and `PretrainedPipeline`.\n",
    "It can also create a parser dictionary, which is then used to build a `PipelineOutputParser`.\n",
    "\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## **🔎 Functions**\n",
    "\n",
    "- `printPipelineSchema`: Prints the schema of the pipeline.\n",
    "- `createParserDictionary`: Returns a parser dictionary that can be used to create a PipelineOutputParser.\n",
    "- `getPossibleEntities`: Returns a list of possible entities that the pipeline can include.\n",
    "- `getPossibleAssertions`: Returns a list of possible assertions that the pipeline can include.\n",
    "- `getPossibleRelations`: Returns a list of possible relations that the pipeline can include.\n",
    "- `getPipelineStages`: Returns a list of PipelineStage objects that represent the stages of the pipeline.\n",
    "- `getParserDictDirectly`: Returns a parser dictionary that can be used to create a PipelineOutputParser, without first creating a PipelineTracer object.\n",
    "- `listAvailableModels`: Returns a list of available models for a given language and source.\n",
    "- `showAvailableModels`: Prints a list of available models for a given language and source."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Oncology Pipeline"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning::Spark Session already created, some configs may not take.\n",
      "Warning::Spark Session already created, some configs may not take.\n",
      "explain_clinical_doc_oncology download started this may take some time.\n",
      "Approx size to download 1.8 GB\n",
      "[OK!]\n"
     ]
    }
   ],
   "source": [
    "pipe = nlp.load(\"en.explain_doc.clinical_oncology.pipeline\")"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:55.486962300Z",
     "start_time": "2024-07-13T04:52:50.614860Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning::Spark Session already created, some configs may not take.\n"
     ]
    },
    {
     "data": {
      "text/plain": "['Past', 'Family', 'Absent', 'Hypothetical', 'Possible', 'Present']"
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pipe.getPossibleAssertions()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:51:44.875302800Z",
     "start_time": "2024-07-13T04:51:44.206488400Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [
    {
     "data": {
      "text/plain": "['Cycle_Number',\n 'Direction',\n 'Histological_Type',\n 'Biomarker_Result',\n 'Site_Other_Body_Part',\n 'Hormonal_Therapy',\n 'Death_Entity',\n 'Targeted_Therapy',\n 'Route',\n 'Tumor_Finding',\n 'Duration',\n 'Pathology_Result',\n 'Chemotherapy',\n 'Date',\n 'Radiotherapy',\n 'Radiation_Dose',\n 'Oncogene',\n 'Cancer_Surgery',\n 'Tumor_Size',\n 'Staging',\n 'Pathology_Test',\n 'Cancer_Dx',\n 'Age',\n 'Site_Lung',\n 'Site_Breast',\n 'Site_Liver',\n 'Site_Lymph_Node',\n 'Response_To_Treatment',\n 'Site_Brain',\n 'Immunotherapy',\n 'Race_Ethnicity',\n 'Metastasis',\n 'Smoking_Status',\n 'Imaging_Test',\n 'Relative_Date',\n 'Line_Of_Therapy',\n 'Unspecific_Therapy',\n 'Site_Bone',\n 'Gender',\n 'Cycle_Count',\n 'Cancer_Score',\n 'Adenopathy',\n 'Grade',\n 'Biomarker',\n 'Invasion',\n 'Frequency',\n 'Performance_Status',\n 'Dosage',\n 'Cycle_Day',\n 'Anatomical_Site',\n 'Size_Trend',\n 'Posology_Information',\n 'Cancer_Therapy',\n 'Lymph_Node',\n 'Tumor_Description',\n 'Lymph_Node_Modifier',\n 'Alcohol',\n 'BMI',\n 'Communicable_Disease',\n 'Obesity',\n 'Oncological',\n 'Diabetes',\n 'Weight',\n 'Overweight']"
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pipe.getPossibleEntities()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:51:44.981212Z",
     "start_time": "2024-07-13T04:51:44.878303100Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [
    {
     "data": {
      "text/plain": "['is_size_of', 'is_date_of', 'is_location_of', 'is_finding_of']"
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pipe.getPossibleRelations()"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:51:44.991729800Z",
     "start_time": "2024-07-13T04:51:44.911462400Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### With custom column_maps"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "outputs": [
    {
     "data": {
      "text/plain": "{'document_identifier': 'clinical_deidentification',\n 'document_text': 'document',\n 'entities': ['merged_chunk', 'merged_chunk_for_assertion'],\n 'assertions': ['assertion'],\n 'resolutions': [],\n 'relations': ['all_relations'],\n 'summaries': [],\n 'deidentifications': [],\n 'classifications': []}"
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "column_maps = pipe.createParserDictionary()\n",
    "column_maps.update({\"document_identifier\": \"clinical_deidentification\"})\n",
    "column_maps"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:04.613445800Z",
     "start_time": "2024-07-13T04:52:04.604169300Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "outputs": [],
   "source": [
    "res = pipe.predict(\"The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response\",\n",
    "            parser_output=True,\n",
    "            parser_config=column_maps)"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:11.492687400Z",
     "start_time": "2024-07-13T04:52:05.814902700Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "outputs": [
    {
     "data": {
      "text/plain": "{'result': [{'document_identifier': 'clinical_deidentification',\n   'document_text': ['The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response'],\n   'entities': [{'chunk_id': '1b71b12a',\n     'chunk': 'computed tomography',\n     'begin': 24,\n     'end': 42,\n     'ner_label': 'Imaging_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9575'},\n    {'chunk_id': 'ce9ac1a9',\n     'chunk': 'CT',\n     'begin': 45,\n     'end': 46,\n     'ner_label': 'Imaging_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9565'},\n    {'chunk_id': '3576c965',\n     'chunk': 'abdomen',\n     'begin': 61,\n     'end': 67,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9446'},\n    {'chunk_id': 'cff2288c',\n     'chunk': 'pelvis',\n     'begin': 73,\n     'end': 78,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.6514'},\n    {'chunk_id': '98848a68',\n     'chunk': 'ovarian',\n     'begin': 104,\n     'end': 110,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.7915'},\n    {'chunk_id': 'd3e628e9',\n     'chunk': 'mass',\n     'begin': 112,\n     'end': 115,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 
'ner_oncology_chunk',\n     'ner_confidence': '0.9557'},\n    {'chunk_id': '3d8b6be0',\n     'chunk': 'Pap smear',\n     'begin': 120,\n     'end': 128,\n     'ner_label': 'Pathology_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.96725'},\n    {'chunk_id': '4d03018b',\n     'chunk': 'one month later',\n     'begin': 140,\n     'end': 154,\n     'ner_label': 'Relative_Date',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.8786667'},\n    {'chunk_id': '8de23a92',\n     'chunk': 'atypical glandular cells',\n     'begin': 173,\n     'end': 196,\n     'ner_label': 'Pathology_Result',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.7270667'},\n    {'chunk_id': '70affced',\n     'chunk': 'adenocarcinoma',\n     'begin': 213,\n     'end': 226,\n     'ner_label': 'Cancer_Dx',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9992'},\n    {'chunk_id': '71dddb8a',\n     'chunk': 'pathologic specimen',\n     'begin': 233,\n     'end': 251,\n     'ner_label': 'Pathology_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.76105'},\n    {'chunk_id': '63e46bca',\n     'chunk': 'extension',\n     'begin': 260,\n     'end': 268,\n     'ner_label': 'Invasion',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9241'},\n    {'chunk_id': 'ac5748d2',\n     'chunk': 'tumor',\n     'begin': 277,\n     'end': 281,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9077'},\n    {'chunk_id': '74e8e40b',\n     'chunk': 'fallopian tubes',\n     'begin': 298,\n     'end': 312,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9492'},\n    {'chunk_id': '76146911',\n     'chunk': 'appendix',\n     'begin': 315,\n     'end': 322,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9971'},\n    
{'chunk_id': 'dc74e652',\n     'chunk': 'omentum',\n     'begin': 325,\n     'end': 331,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9937'},\n    {'chunk_id': '5d80c8a0',\n     'chunk': 'enlarged',\n     'begin': 349,\n     'end': 356,\n     'ner_label': 'Lymph_Node_Modifier',\n     'ner_source': 'ner_oncology_tnm_chunk',\n     'ner_confidence': '0.9882'},\n    {'chunk_id': '2bd5973e',\n     'chunk': 'lymph nodes',\n     'begin': 358,\n     'end': 368,\n     'ner_label': 'Site_Lymph_Node',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.80105'},\n    {'chunk_id': '4d1b51ea',\n     'chunk': 'tumor',\n     'begin': 409,\n     'end': 413,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9851'},\n    {'chunk_id': '9b3f72d4',\n     'chunk': 'stage IIIC',\n     'begin': 419,\n     'end': 428,\n     'ner_label': 'Staging',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.99035'},\n    {'chunk_id': '863911ea',\n     'chunk': 'papillary serous ovarian adenocarcinoma',\n     'begin': 430,\n     'end': 468,\n     'ner_label': 'Oncological',\n     'ner_source': 'ner_jsl_chunk',\n     'ner_confidence': '0.60825'},\n    {'chunk_id': 'd47b22b6',\n     'chunk': 'Two months later',\n     'begin': 471,\n     'end': 486,\n     'ner_label': 'Relative_Date',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9206333'},\n    {'chunk_id': '4d3991f5',\n     'chunk': 'lung metastases',\n     'begin': 520,\n     'end': 534,\n     'ner_label': 'Oncological',\n     'ner_source': 'ner_jsl_chunk',\n     'ner_confidence': '0.96220005'},\n    {'chunk_id': 'c2e02074',\n     'chunk': 'Neoadjuvant chemotherapy',\n     'begin': 536,\n     'end': 559,\n     'ner_label': 'Chemotherapy',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9786'},\n    {'chunk_id': 'd5d30ff5',\n     'chunk': 
'Cyclophosphamide',\n     'begin': 582,\n     'end': 597,\n     'ner_label': 'Chemotherapy',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9997'},\n    {'chunk_id': '98f81754',\n     'chunk': '500 mg/m2',\n     'begin': 600,\n     'end': 608,\n     'ner_label': 'Dosage',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.92939997'},\n    {'chunk_id': 'bb801681',\n     'chunk': '6 cycles',\n     'begin': 630,\n     'end': 637,\n     'ner_label': 'Cycle_Count',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.54385'},\n    {'chunk_id': 'a48ae8cc',\n     'chunk': 'poor response',\n     'begin': 644,\n     'end': 656,\n     'ner_label': 'Response_To_Treatment',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.88715'}],\n   'assertions': [{'chunk_id': '1b71b12a',\n     'chunk': 'computed tomography',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'ce9ac1a9',\n     'chunk': 'CT',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'd3e628e9',\n     'chunk': 'mass',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '3d8b6be0',\n     'chunk': 'Pap smear',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '8de23a92',\n     'chunk': 'atypical glandular cells',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '70affced',\n     'chunk': 'adenocarcinoma',\n     'assertion': 'Possible',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '71dddb8a',\n     'chunk': 'pathologic specimen',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '63e46bca',\n     'chunk': 'extension',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'ac5748d2',\n     'chunk': 'tumor',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    
{'chunk_id': '5d80c8a0',\n     'chunk': 'enlarged',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '4d1b51ea',\n     'chunk': 'tumor',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '863911ea',\n     'chunk': 'papillary serous ovarian adenocarcinoma',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '4d3991f5',\n     'chunk': 'lung metastases',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'c2e02074',\n     'chunk': 'Neoadjuvant chemotherapy',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'd5d30ff5',\n     'chunk': 'Cyclophosphamide',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'bb801681',\n     'chunk': '6 cycles',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'}],\n   'resolutions': [],\n   'relations': [{'relation': 'O',\n     'chunk1_id': '3576c965',\n     'chunk1': 'abdomen',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.9439166',\n     'direction': 'both'},\n    {'relation': 'O',\n     'chunk1_id': 'cff2288c',\n     'chunk1': 'pelvis',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.9611397',\n     'direction': 'both'},\n    {'relation': 'is_location_of',\n     'chunk1_id': '98848a68',\n     'chunk1': 'ovarian',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.922661',\n     'direction': 'both'},\n    {'relation': 'is_finding_of',\n     'chunk1_id': '3d8b6be0',\n     'chunk1': 'Pap smear',\n     'chunk2_id': '70affced',\n     'chunk2': 'adenocarcinoma',\n     'confidence': '0.52542114',\n     'direction': 'both'},\n    {'relation': 'is_location_of',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': '74e8e40b',\n     'chunk2': 'fallopian tubes',\n     'confidence': '0.9026299',\n    
 'direction': 'both'},\n    {'relation': 'is_location_of',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': '76146911',\n     'chunk2': 'appendix',\n     'confidence': '0.6649267',\n     'direction': 'both'},\n    {'relation': 'O',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': 'dc74e652',\n     'chunk2': 'omentum',\n     'confidence': '0.80328876',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Dosage',\n     'chunk1_id': 'c2e02074',\n     'chunk1': 'Neoadjuvant chemotherapy',\n     'chunk2_id': '98f81754',\n     'chunk2': '500 mg/m2',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Cycle_Count',\n     'chunk1_id': 'c2e02074',\n     'chunk1': 'Neoadjuvant chemotherapy',\n     'chunk2_id': 'bb801681',\n     'chunk2': '6 cycles',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Dosage',\n     'chunk1_id': 'd5d30ff5',\n     'chunk1': 'Cyclophosphamide',\n     'chunk2_id': '98f81754',\n     'chunk2': '500 mg/m2',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Cycle_Count',\n     'chunk1_id': 'd5d30ff5',\n     'chunk1': 'Cyclophosphamide',\n     'chunk2_id': 'bb801681',\n     'chunk2': '6 cycles',\n     'confidence': '1.0',\n     'direction': 'both'}],\n   'summaries': [],\n   'deidentifications': [],\n   'classifications': []}]}"
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:18.941938500Z",
     "start_time": "2024-07-13T04:52:18.924647500Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [
    {
     "data": {
      "text/plain": "    chunk_id                                    chunk  begin  end  \\\n0   1b71b12a                      computed tomography     24   42   \n1   ce9ac1a9                                       CT     45   46   \n2   3576c965                                  abdomen     61   67   \n3   cff2288c                                   pelvis     73   78   \n4   98848a68                                  ovarian    104  110   \n5   d3e628e9                                     mass    112  115   \n6   3d8b6be0                                Pap smear    120  128   \n7   4d03018b                          one month later    140  154   \n8   8de23a92                 atypical glandular cells    173  196   \n9   70affced                           adenocarcinoma    213  226   \n10  71dddb8a                      pathologic specimen    233  251   \n11  63e46bca                                extension    260  268   \n12  ac5748d2                                    tumor    277  281   \n13  74e8e40b                          fallopian tubes    298  312   \n14  76146911                                 appendix    315  322   \n15  dc74e652                                  omentum    325  331   \n16  5d80c8a0                                 enlarged    349  356   \n17  2bd5973e                              lymph nodes    358  368   \n18  4d1b51ea                                    tumor    409  413   \n19  9b3f72d4                               stage IIIC    419  428   \n20  863911ea  papillary serous ovarian adenocarcinoma    430  468   \n21  d47b22b6                         Two months later    471  486   \n22  4d3991f5                          lung metastases    520  534   \n23  c2e02074                 Neoadjuvant chemotherapy    536  559   \n24  d5d30ff5                         Cyclophosphamide    582  597   \n25  98f81754                                500 mg/m2    600  608   \n26  bb801681                                 6 cycles    630  637   \n27  a48ae8cc      
                      poor response    644  656   \n\n                ner_label              ner_source ner_confidence  \n0            Imaging_Test      ner_oncology_chunk         0.9575  \n1            Imaging_Test      ner_oncology_chunk         0.9565  \n2    Site_Other_Body_Part      ner_oncology_chunk         0.9446  \n3    Site_Other_Body_Part      ner_oncology_chunk         0.6514  \n4    Site_Other_Body_Part      ner_oncology_chunk         0.7915  \n5           Tumor_Finding      ner_oncology_chunk         0.9557  \n6          Pathology_Test      ner_oncology_chunk        0.96725  \n7           Relative_Date      ner_oncology_chunk      0.8786667  \n8        Pathology_Result      ner_oncology_chunk      0.7270667  \n9               Cancer_Dx      ner_oncology_chunk         0.9992  \n10         Pathology_Test      ner_oncology_chunk        0.76105  \n11               Invasion      ner_oncology_chunk         0.9241  \n12          Tumor_Finding      ner_oncology_chunk         0.9077  \n13   Site_Other_Body_Part      ner_oncology_chunk         0.9492  \n14   Site_Other_Body_Part      ner_oncology_chunk         0.9971  \n15   Site_Other_Body_Part      ner_oncology_chunk         0.9937  \n16    Lymph_Node_Modifier  ner_oncology_tnm_chunk         0.9882  \n17        Site_Lymph_Node      ner_oncology_chunk        0.80105  \n18          Tumor_Finding      ner_oncology_chunk         0.9851  \n19                Staging      ner_oncology_chunk        0.99035  \n20            Oncological           ner_jsl_chunk        0.60825  \n21          Relative_Date      ner_oncology_chunk      0.9206333  \n22            Oncological           ner_jsl_chunk     0.96220005  \n23           Chemotherapy      ner_oncology_chunk         0.9786  \n24           Chemotherapy      ner_oncology_chunk         0.9997  \n25                 Dosage      ner_oncology_chunk     0.92939997  \n26            Cycle_Count      ner_oncology_chunk        0.54385  \n27  Response_To_Treatment      
ner_oncology_chunk        0.88715  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>chunk_id</th>\n      <th>chunk</th>\n      <th>begin</th>\n      <th>end</th>\n      <th>ner_label</th>\n      <th>ner_source</th>\n      <th>ner_confidence</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1b71b12a</td>\n      <td>computed tomography</td>\n      <td>24</td>\n      <td>42</td>\n      <td>Imaging_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9575</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>ce9ac1a9</td>\n      <td>CT</td>\n      <td>45</td>\n      <td>46</td>\n      <td>Imaging_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9565</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3576c965</td>\n      <td>abdomen</td>\n      <td>61</td>\n      <td>67</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9446</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>cff2288c</td>\n      <td>pelvis</td>\n      <td>73</td>\n      <td>78</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.6514</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>98848a68</td>\n      <td>ovarian</td>\n      <td>104</td>\n      <td>110</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.7915</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>112</td>\n      <td>115</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9557</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>3d8b6be0</td>\n      <td>Pap smear</td>\n      <td>120</td>\n      
<td>128</td>\n      <td>Pathology_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.96725</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>4d03018b</td>\n      <td>one month later</td>\n      <td>140</td>\n      <td>154</td>\n      <td>Relative_Date</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.8786667</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>8de23a92</td>\n      <td>atypical glandular cells</td>\n      <td>173</td>\n      <td>196</td>\n      <td>Pathology_Result</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.7270667</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>70affced</td>\n      <td>adenocarcinoma</td>\n      <td>213</td>\n      <td>226</td>\n      <td>Cancer_Dx</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9992</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>71dddb8a</td>\n      <td>pathologic specimen</td>\n      <td>233</td>\n      <td>251</td>\n      <td>Pathology_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.76105</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>63e46bca</td>\n      <td>extension</td>\n      <td>260</td>\n      <td>268</td>\n      <td>Invasion</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9241</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>277</td>\n      <td>281</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9077</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>74e8e40b</td>\n      <td>fallopian tubes</td>\n      <td>298</td>\n      <td>312</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9492</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>76146911</td>\n      <td>appendix</td>\n      <td>315</td>\n      <td>322</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9971</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>dc74e652</td>\n      <td>omentum</td>\n     
 <td>325</td>\n      <td>331</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9937</td>\n    </tr>\n    <tr>\n      <th>16</th>\n      <td>5d80c8a0</td>\n      <td>enlarged</td>\n      <td>349</td>\n      <td>356</td>\n      <td>Lymph_Node_Modifier</td>\n      <td>ner_oncology_tnm_chunk</td>\n      <td>0.9882</td>\n    </tr>\n    <tr>\n      <th>17</th>\n      <td>2bd5973e</td>\n      <td>lymph nodes</td>\n      <td>358</td>\n      <td>368</td>\n      <td>Site_Lymph_Node</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.80105</td>\n    </tr>\n    <tr>\n      <th>18</th>\n      <td>4d1b51ea</td>\n      <td>tumor</td>\n      <td>409</td>\n      <td>413</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9851</td>\n    </tr>\n    <tr>\n      <th>19</th>\n      <td>9b3f72d4</td>\n      <td>stage IIIC</td>\n      <td>419</td>\n      <td>428</td>\n      <td>Staging</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.99035</td>\n    </tr>\n    <tr>\n      <th>20</th>\n      <td>863911ea</td>\n      <td>papillary serous ovarian adenocarcinoma</td>\n      <td>430</td>\n      <td>468</td>\n      <td>Oncological</td>\n      <td>ner_jsl_chunk</td>\n      <td>0.60825</td>\n    </tr>\n    <tr>\n      <th>21</th>\n      <td>d47b22b6</td>\n      <td>Two months later</td>\n      <td>471</td>\n      <td>486</td>\n      <td>Relative_Date</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9206333</td>\n    </tr>\n    <tr>\n      <th>22</th>\n      <td>4d3991f5</td>\n      <td>lung metastases</td>\n      <td>520</td>\n      <td>534</td>\n      <td>Oncological</td>\n      <td>ner_jsl_chunk</td>\n      <td>0.96220005</td>\n    </tr>\n    <tr>\n      <th>23</th>\n      <td>c2e02074</td>\n      <td>Neoadjuvant chemotherapy</td>\n      <td>536</td>\n      <td>559</td>\n      <td>Chemotherapy</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9786</td>\n    </tr>\n    <tr>\n      <th>24</th>\n      
<td>d5d30ff5</td>\n      <td>Cyclophosphamide</td>\n      <td>582</td>\n      <td>597</td>\n      <td>Chemotherapy</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9997</td>\n    </tr>\n    <tr>\n      <th>25</th>\n      <td>98f81754</td>\n      <td>500 mg/m2</td>\n      <td>600</td>\n      <td>608</td>\n      <td>Dosage</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.92939997</td>\n    </tr>\n    <tr>\n      <th>26</th>\n      <td>bb801681</td>\n      <td>6 cycles</td>\n      <td>630</td>\n      <td>637</td>\n      <td>Cycle_Count</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.54385</td>\n    </tr>\n    <tr>\n      <th>27</th>\n      <td>a48ae8cc</td>\n      <td>poor response</td>\n      <td>644</td>\n      <td>656</td>\n      <td>Response_To_Treatment</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.88715</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.json_normalize(res['result'][0][\"entities\"])"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:20.078264700Z",
     "start_time": "2024-07-13T04:52:20.053415800Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "#### without custom column_maps (createParserDictionary)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warning::Spark Session already created, some configs may not take.\n",
      "Warning::Spark Session already created, some configs may not take.\n",
      "explain_clinical_doc_oncology download started this may take some time.\n",
      "Approx size to download 1.8 GB\n",
      "[OK!]\n",
      "Warning::Spark Session already created, some configs may not take.\n"
     ]
    }
   ],
   "source": [
    "res = nlu.load(\"en.explain_doc.clinical_oncology.pipeline\").predict(\n",
    "            \"The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response\",\n",
    "            parser_output=True\n",
    "        )"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.732101100Z",
     "start_time": "2024-07-13T04:52:21.858672100Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "outputs": [
    {
     "data": {
      "text/plain": "{'result': [{'document_identifier': 'Document',\n   'document_text': ['The Patient underwent a computed tomography (CT) scan of the abdomen and pelvis, which showed a complex ovarian mass. A Pap smear performed one month later was positive for atypical glandular cells suspicious for adenocarcinoma. The pathologic specimen showed extension of the tumor throughout the fallopian tubes, appendix, omentum, and 5 out of 5 enlarged lymph nodes. The final pathologic diagnosis of the tumor was stage IIIC papillary serous ovarian adenocarcinoma. Two months later, the patient was diagnosed with lung metastases.Neoadjuvant chemotherapy with the regimens of Cyclophosphamide (500 mg/m2) is being given for 6 cycles with poor response'],\n   'entities': [{'chunk_id': '1b71b12a',\n     'chunk': 'computed tomography',\n     'begin': 24,\n     'end': 42,\n     'ner_label': 'Imaging_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9575'},\n    {'chunk_id': 'ce9ac1a9',\n     'chunk': 'CT',\n     'begin': 45,\n     'end': 46,\n     'ner_label': 'Imaging_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9565'},\n    {'chunk_id': '3576c965',\n     'chunk': 'abdomen',\n     'begin': 61,\n     'end': 67,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9446'},\n    {'chunk_id': 'cff2288c',\n     'chunk': 'pelvis',\n     'begin': 73,\n     'end': 78,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.6514'},\n    {'chunk_id': '98848a68',\n     'chunk': 'ovarian',\n     'begin': 104,\n     'end': 110,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.7915'},\n    {'chunk_id': 'd3e628e9',\n     'chunk': 'mass',\n     'begin': 112,\n     'end': 115,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 'ner_oncology_chunk',\n     
'ner_confidence': '0.9557'},\n    {'chunk_id': '3d8b6be0',\n     'chunk': 'Pap smear',\n     'begin': 120,\n     'end': 128,\n     'ner_label': 'Pathology_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.96725'},\n    {'chunk_id': '4d03018b',\n     'chunk': 'one month later',\n     'begin': 140,\n     'end': 154,\n     'ner_label': 'Relative_Date',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.8786667'},\n    {'chunk_id': '8de23a92',\n     'chunk': 'atypical glandular cells',\n     'begin': 173,\n     'end': 196,\n     'ner_label': 'Pathology_Result',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.7270667'},\n    {'chunk_id': '70affced',\n     'chunk': 'adenocarcinoma',\n     'begin': 213,\n     'end': 226,\n     'ner_label': 'Cancer_Dx',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9992'},\n    {'chunk_id': '71dddb8a',\n     'chunk': 'pathologic specimen',\n     'begin': 233,\n     'end': 251,\n     'ner_label': 'Pathology_Test',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.76105'},\n    {'chunk_id': '63e46bca',\n     'chunk': 'extension',\n     'begin': 260,\n     'end': 268,\n     'ner_label': 'Invasion',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9241'},\n    {'chunk_id': 'ac5748d2',\n     'chunk': 'tumor',\n     'begin': 277,\n     'end': 281,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9077'},\n    {'chunk_id': '74e8e40b',\n     'chunk': 'fallopian tubes',\n     'begin': 298,\n     'end': 312,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9492'},\n    {'chunk_id': '76146911',\n     'chunk': 'appendix',\n     'begin': 315,\n     'end': 322,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9971'},\n    {'chunk_id': 'dc74e652',\n    
 'chunk': 'omentum',\n     'begin': 325,\n     'end': 331,\n     'ner_label': 'Site_Other_Body_Part',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9937'},\n    {'chunk_id': '5d80c8a0',\n     'chunk': 'enlarged',\n     'begin': 349,\n     'end': 356,\n     'ner_label': 'Lymph_Node_Modifier',\n     'ner_source': 'ner_oncology_tnm_chunk',\n     'ner_confidence': '0.9882'},\n    {'chunk_id': '2bd5973e',\n     'chunk': 'lymph nodes',\n     'begin': 358,\n     'end': 368,\n     'ner_label': 'Site_Lymph_Node',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.80105'},\n    {'chunk_id': '4d1b51ea',\n     'chunk': 'tumor',\n     'begin': 409,\n     'end': 413,\n     'ner_label': 'Tumor_Finding',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9851'},\n    {'chunk_id': '9b3f72d4',\n     'chunk': 'stage IIIC',\n     'begin': 419,\n     'end': 428,\n     'ner_label': 'Staging',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.99035'},\n    {'chunk_id': '863911ea',\n     'chunk': 'papillary serous ovarian adenocarcinoma',\n     'begin': 430,\n     'end': 468,\n     'ner_label': 'Oncological',\n     'ner_source': 'ner_jsl_chunk',\n     'ner_confidence': '0.60825'},\n    {'chunk_id': 'd47b22b6',\n     'chunk': 'Two months later',\n     'begin': 471,\n     'end': 486,\n     'ner_label': 'Relative_Date',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9206333'},\n    {'chunk_id': '4d3991f5',\n     'chunk': 'lung metastases',\n     'begin': 520,\n     'end': 534,\n     'ner_label': 'Oncological',\n     'ner_source': 'ner_jsl_chunk',\n     'ner_confidence': '0.96220005'},\n    {'chunk_id': 'c2e02074',\n     'chunk': 'Neoadjuvant chemotherapy',\n     'begin': 536,\n     'end': 559,\n     'ner_label': 'Chemotherapy',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9786'},\n    {'chunk_id': 'd5d30ff5',\n     'chunk': 'Cyclophosphamide',\n     'begin': 582,\n     
'end': 597,\n     'ner_label': 'Chemotherapy',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.9997'},\n    {'chunk_id': '98f81754',\n     'chunk': '500 mg/m2',\n     'begin': 600,\n     'end': 608,\n     'ner_label': 'Dosage',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.92939997'},\n    {'chunk_id': 'bb801681',\n     'chunk': '6 cycles',\n     'begin': 630,\n     'end': 637,\n     'ner_label': 'Cycle_Count',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.54385'},\n    {'chunk_id': 'a48ae8cc',\n     'chunk': 'poor response',\n     'begin': 644,\n     'end': 656,\n     'ner_label': 'Response_To_Treatment',\n     'ner_source': 'ner_oncology_chunk',\n     'ner_confidence': '0.88715'}],\n   'assertions': [{'chunk_id': '1b71b12a',\n     'chunk': 'computed tomography',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'ce9ac1a9',\n     'chunk': 'CT',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'd3e628e9',\n     'chunk': 'mass',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '3d8b6be0',\n     'chunk': 'Pap smear',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '8de23a92',\n     'chunk': 'atypical glandular cells',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '70affced',\n     'chunk': 'adenocarcinoma',\n     'assertion': 'Possible',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '71dddb8a',\n     'chunk': 'pathologic specimen',\n     'assertion': 'Past',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '63e46bca',\n     'chunk': 'extension',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'ac5748d2',\n     'chunk': 'tumor',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '5d80c8a0',\n     'chunk': 'enlarged',\n   
  'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '4d1b51ea',\n     'chunk': 'tumor',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '863911ea',\n     'chunk': 'papillary serous ovarian adenocarcinoma',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': '4d3991f5',\n     'chunk': 'lung metastases',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'c2e02074',\n     'chunk': 'Neoadjuvant chemotherapy',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'd5d30ff5',\n     'chunk': 'Cyclophosphamide',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'},\n    {'chunk_id': 'bb801681',\n     'chunk': '6 cycles',\n     'assertion': 'Present',\n     'assertion_source': 'assertion'}],\n   'resolutions': [],\n   'relations': [{'relation': 'O',\n     'chunk1_id': '3576c965',\n     'chunk1': 'abdomen',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.9439166',\n     'direction': 'both'},\n    {'relation': 'O',\n     'chunk1_id': 'cff2288c',\n     'chunk1': 'pelvis',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.9611397',\n     'direction': 'both'},\n    {'relation': 'is_location_of',\n     'chunk1_id': '98848a68',\n     'chunk1': 'ovarian',\n     'chunk2_id': 'd3e628e9',\n     'chunk2': 'mass',\n     'confidence': '0.922661',\n     'direction': 'both'},\n    {'relation': 'is_finding_of',\n     'chunk1_id': '3d8b6be0',\n     'chunk1': 'Pap smear',\n     'chunk2_id': '70affced',\n     'chunk2': 'adenocarcinoma',\n     'confidence': '0.52542114',\n     'direction': 'both'},\n    {'relation': 'is_location_of',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': '74e8e40b',\n     'chunk2': 'fallopian tubes',\n     'confidence': '0.9026299',\n     'direction': 'both'},\n    {'relation': 
'is_location_of',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': '76146911',\n     'chunk2': 'appendix',\n     'confidence': '0.6649267',\n     'direction': 'both'},\n    {'relation': 'O',\n     'chunk1_id': 'ac5748d2',\n     'chunk1': 'tumor',\n     'chunk2_id': 'dc74e652',\n     'chunk2': 'omentum',\n     'confidence': '0.80328876',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Dosage',\n     'chunk1_id': 'c2e02074',\n     'chunk1': 'Neoadjuvant chemotherapy',\n     'chunk2_id': '98f81754',\n     'chunk2': '500 mg/m2',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Cycle_Count',\n     'chunk1_id': 'c2e02074',\n     'chunk1': 'Neoadjuvant chemotherapy',\n     'chunk2_id': 'bb801681',\n     'chunk2': '6 cycles',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Dosage',\n     'chunk1_id': 'd5d30ff5',\n     'chunk1': 'Cyclophosphamide',\n     'chunk2_id': '98f81754',\n     'chunk2': '500 mg/m2',\n     'confidence': '1.0',\n     'direction': 'both'},\n    {'relation': 'Chemotherapy-Cycle_Count',\n     'chunk1_id': 'd5d30ff5',\n     'chunk1': 'Cyclophosphamide',\n     'chunk2_id': 'bb801681',\n     'chunk2': '6 cycles',\n     'confidence': '1.0',\n     'direction': 'both'}],\n   'summaries': [],\n   'deidentifications': [],\n   'classifications': []}]}"
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "res"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.748781500Z",
     "start_time": "2024-07-13T04:52:29.734101800Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "outputs": [
    {
     "data": {
      "text/plain": "  document_identifier                                      document_text  \\\n0            Document  [The Patient underwent a computed tomography (...   \n\n                                            entities  \\\n0  [{'chunk_id': '1b71b12a', 'chunk': 'computed t...   \n\n                                          assertions resolutions  \\\n0  [{'chunk_id': '1b71b12a', 'chunk': 'computed t...          []   \n\n                                           relations summaries  \\\n0  [{'relation': 'O', 'chunk1_id': '3576c965', 'c...        []   \n\n  deidentifications classifications  \n0                []              []  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>document_identifier</th>\n      <th>document_text</th>\n      <th>entities</th>\n      <th>assertions</th>\n      <th>resolutions</th>\n      <th>relations</th>\n      <th>summaries</th>\n      <th>deidentifications</th>\n      <th>classifications</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>Document</td>\n      <td>[The Patient underwent a computed tomography (...</td>\n      <td>[{'chunk_id': '1b71b12a', 'chunk': 'computed t...</td>\n      <td>[{'chunk_id': '1b71b12a', 'chunk': 'computed t...</td>\n      <td>[]</td>\n      <td>[{'relation': 'O', 'chunk1_id': '3576c965', 'c...</td>\n      <td>[]</td>\n      <td>[]</td>\n      <td>[]</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.json_normalize(res['result'][0])"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.799782200Z",
     "start_time": "2024-07-13T04:52:29.750784900Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "outputs": [
    {
     "data": {
      "text/plain": "    chunk_id                                    chunk  begin  end  \\\n0   1b71b12a                      computed tomography     24   42   \n1   ce9ac1a9                                       CT     45   46   \n2   3576c965                                  abdomen     61   67   \n3   cff2288c                                   pelvis     73   78   \n4   98848a68                                  ovarian    104  110   \n5   d3e628e9                                     mass    112  115   \n6   3d8b6be0                                Pap smear    120  128   \n7   4d03018b                          one month later    140  154   \n8   8de23a92                 atypical glandular cells    173  196   \n9   70affced                           adenocarcinoma    213  226   \n10  71dddb8a                      pathologic specimen    233  251   \n11  63e46bca                                extension    260  268   \n12  ac5748d2                                    tumor    277  281   \n13  74e8e40b                          fallopian tubes    298  312   \n14  76146911                                 appendix    315  322   \n15  dc74e652                                  omentum    325  331   \n16  5d80c8a0                                 enlarged    349  356   \n17  2bd5973e                              lymph nodes    358  368   \n18  4d1b51ea                                    tumor    409  413   \n19  9b3f72d4                               stage IIIC    419  428   \n20  863911ea  papillary serous ovarian adenocarcinoma    430  468   \n21  d47b22b6                         Two months later    471  486   \n22  4d3991f5                          lung metastases    520  534   \n23  c2e02074                 Neoadjuvant chemotherapy    536  559   \n24  d5d30ff5                         Cyclophosphamide    582  597   \n25  98f81754                                500 mg/m2    600  608   \n26  bb801681                                 6 cycles    630  637   \n27  a48ae8cc      
                      poor response    644  656   \n\n                ner_label              ner_source ner_confidence  \n0            Imaging_Test      ner_oncology_chunk         0.9575  \n1            Imaging_Test      ner_oncology_chunk         0.9565  \n2    Site_Other_Body_Part      ner_oncology_chunk         0.9446  \n3    Site_Other_Body_Part      ner_oncology_chunk         0.6514  \n4    Site_Other_Body_Part      ner_oncology_chunk         0.7915  \n5           Tumor_Finding      ner_oncology_chunk         0.9557  \n6          Pathology_Test      ner_oncology_chunk        0.96725  \n7           Relative_Date      ner_oncology_chunk      0.8786667  \n8        Pathology_Result      ner_oncology_chunk      0.7270667  \n9               Cancer_Dx      ner_oncology_chunk         0.9992  \n10         Pathology_Test      ner_oncology_chunk        0.76105  \n11               Invasion      ner_oncology_chunk         0.9241  \n12          Tumor_Finding      ner_oncology_chunk         0.9077  \n13   Site_Other_Body_Part      ner_oncology_chunk         0.9492  \n14   Site_Other_Body_Part      ner_oncology_chunk         0.9971  \n15   Site_Other_Body_Part      ner_oncology_chunk         0.9937  \n16    Lymph_Node_Modifier  ner_oncology_tnm_chunk         0.9882  \n17        Site_Lymph_Node      ner_oncology_chunk        0.80105  \n18          Tumor_Finding      ner_oncology_chunk         0.9851  \n19                Staging      ner_oncology_chunk        0.99035  \n20            Oncological           ner_jsl_chunk        0.60825  \n21          Relative_Date      ner_oncology_chunk      0.9206333  \n22            Oncological           ner_jsl_chunk     0.96220005  \n23           Chemotherapy      ner_oncology_chunk         0.9786  \n24           Chemotherapy      ner_oncology_chunk         0.9997  \n25                 Dosage      ner_oncology_chunk     0.92939997  \n26            Cycle_Count      ner_oncology_chunk        0.54385  \n27  Response_To_Treatment      
ner_oncology_chunk        0.88715  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>chunk_id</th>\n      <th>chunk</th>\n      <th>begin</th>\n      <th>end</th>\n      <th>ner_label</th>\n      <th>ner_source</th>\n      <th>ner_confidence</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1b71b12a</td>\n      <td>computed tomography</td>\n      <td>24</td>\n      <td>42</td>\n      <td>Imaging_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9575</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>ce9ac1a9</td>\n      <td>CT</td>\n      <td>45</td>\n      <td>46</td>\n      <td>Imaging_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9565</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>3576c965</td>\n      <td>abdomen</td>\n      <td>61</td>\n      <td>67</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9446</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>cff2288c</td>\n      <td>pelvis</td>\n      <td>73</td>\n      <td>78</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.6514</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>98848a68</td>\n      <td>ovarian</td>\n      <td>104</td>\n      <td>110</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.7915</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>112</td>\n      <td>115</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9557</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>3d8b6be0</td>\n      <td>Pap smear</td>\n      <td>120</td>\n      
<td>128</td>\n      <td>Pathology_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.96725</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>4d03018b</td>\n      <td>one month later</td>\n      <td>140</td>\n      <td>154</td>\n      <td>Relative_Date</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.8786667</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>8de23a92</td>\n      <td>atypical glandular cells</td>\n      <td>173</td>\n      <td>196</td>\n      <td>Pathology_Result</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.7270667</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>70affced</td>\n      <td>adenocarcinoma</td>\n      <td>213</td>\n      <td>226</td>\n      <td>Cancer_Dx</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9992</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>71dddb8a</td>\n      <td>pathologic specimen</td>\n      <td>233</td>\n      <td>251</td>\n      <td>Pathology_Test</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.76105</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>63e46bca</td>\n      <td>extension</td>\n      <td>260</td>\n      <td>268</td>\n      <td>Invasion</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9241</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>277</td>\n      <td>281</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9077</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>74e8e40b</td>\n      <td>fallopian tubes</td>\n      <td>298</td>\n      <td>312</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9492</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>76146911</td>\n      <td>appendix</td>\n      <td>315</td>\n      <td>322</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9971</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>dc74e652</td>\n      <td>omentum</td>\n     
 <td>325</td>\n      <td>331</td>\n      <td>Site_Other_Body_Part</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9937</td>\n    </tr>\n    <tr>\n      <th>16</th>\n      <td>5d80c8a0</td>\n      <td>enlarged</td>\n      <td>349</td>\n      <td>356</td>\n      <td>Lymph_Node_Modifier</td>\n      <td>ner_oncology_tnm_chunk</td>\n      <td>0.9882</td>\n    </tr>\n    <tr>\n      <th>17</th>\n      <td>2bd5973e</td>\n      <td>lymph nodes</td>\n      <td>358</td>\n      <td>368</td>\n      <td>Site_Lymph_Node</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.80105</td>\n    </tr>\n    <tr>\n      <th>18</th>\n      <td>4d1b51ea</td>\n      <td>tumor</td>\n      <td>409</td>\n      <td>413</td>\n      <td>Tumor_Finding</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9851</td>\n    </tr>\n    <tr>\n      <th>19</th>\n      <td>9b3f72d4</td>\n      <td>stage IIIC</td>\n      <td>419</td>\n      <td>428</td>\n      <td>Staging</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.99035</td>\n    </tr>\n    <tr>\n      <th>20</th>\n      <td>863911ea</td>\n      <td>papillary serous ovarian adenocarcinoma</td>\n      <td>430</td>\n      <td>468</td>\n      <td>Oncological</td>\n      <td>ner_jsl_chunk</td>\n      <td>0.60825</td>\n    </tr>\n    <tr>\n      <th>21</th>\n      <td>d47b22b6</td>\n      <td>Two months later</td>\n      <td>471</td>\n      <td>486</td>\n      <td>Relative_Date</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9206333</td>\n    </tr>\n    <tr>\n      <th>22</th>\n      <td>4d3991f5</td>\n      <td>lung metastases</td>\n      <td>520</td>\n      <td>534</td>\n      <td>Oncological</td>\n      <td>ner_jsl_chunk</td>\n      <td>0.96220005</td>\n    </tr>\n    <tr>\n      <th>23</th>\n      <td>c2e02074</td>\n      <td>Neoadjuvant chemotherapy</td>\n      <td>536</td>\n      <td>559</td>\n      <td>Chemotherapy</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9786</td>\n    </tr>\n    <tr>\n      <th>24</th>\n      
<td>d5d30ff5</td>\n      <td>Cyclophosphamide</td>\n      <td>582</td>\n      <td>597</td>\n      <td>Chemotherapy</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.9997</td>\n    </tr>\n    <tr>\n      <th>25</th>\n      <td>98f81754</td>\n      <td>500 mg/m2</td>\n      <td>600</td>\n      <td>608</td>\n      <td>Dosage</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.92939997</td>\n    </tr>\n    <tr>\n      <th>26</th>\n      <td>bb801681</td>\n      <td>6 cycles</td>\n      <td>630</td>\n      <td>637</td>\n      <td>Cycle_Count</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.54385</td>\n    </tr>\n    <tr>\n      <th>27</th>\n      <td>a48ae8cc</td>\n      <td>poor response</td>\n      <td>644</td>\n      <td>656</td>\n      <td>Response_To_Treatment</td>\n      <td>ner_oncology_chunk</td>\n      <td>0.88715</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.json_normalize(res['result'][0][\"entities\"])"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.800781700Z",
     "start_time": "2024-07-13T04:52:29.781787200Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "outputs": [
    {
     "data": {
      "text/plain": "    chunk_id                                    chunk assertion  \\\n0   1b71b12a                      computed tomography      Past   \n1   ce9ac1a9                                       CT      Past   \n2   d3e628e9                                     mass   Present   \n3   3d8b6be0                                Pap smear      Past   \n4   8de23a92                 atypical glandular cells   Present   \n5   70affced                           adenocarcinoma  Possible   \n6   71dddb8a                      pathologic specimen      Past   \n7   63e46bca                                extension   Present   \n8   ac5748d2                                    tumor   Present   \n9   5d80c8a0                                 enlarged   Present   \n10  4d1b51ea                                    tumor   Present   \n11  863911ea  papillary serous ovarian adenocarcinoma   Present   \n12  4d3991f5                          lung metastases   Present   \n13  c2e02074                 Neoadjuvant chemotherapy   Present   \n14  d5d30ff5                         Cyclophosphamide   Present   \n15  bb801681                                 6 cycles   Present   \n\n   assertion_source  \n0         assertion  \n1         assertion  \n2         assertion  \n3         assertion  \n4         assertion  \n5         assertion  \n6         assertion  \n7         assertion  \n8         assertion  \n9         assertion  \n10        assertion  \n11        assertion  \n12        assertion  \n13        assertion  \n14        assertion  \n15        assertion  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>chunk_id</th>\n      <th>chunk</th>\n      <th>assertion</th>\n      <th>assertion_source</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>1b71b12a</td>\n      <td>computed tomography</td>\n      <td>Past</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>ce9ac1a9</td>\n      <td>CT</td>\n      <td>Past</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>3d8b6be0</td>\n      <td>Pap smear</td>\n      <td>Past</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>8de23a92</td>\n      <td>atypical glandular cells</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>70affced</td>\n      <td>adenocarcinoma</td>\n      <td>Possible</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>71dddb8a</td>\n      <td>pathologic specimen</td>\n      <td>Past</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>63e46bca</td>\n      <td>extension</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>5d80c8a0</td>\n      <td>enlarged</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>4d1b51ea</td>\n      
<td>tumor</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>11</th>\n      <td>863911ea</td>\n      <td>papillary serous ovarian adenocarcinoma</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>12</th>\n      <td>4d3991f5</td>\n      <td>lung metastases</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>13</th>\n      <td>c2e02074</td>\n      <td>Neoadjuvant chemotherapy</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>14</th>\n      <td>d5d30ff5</td>\n      <td>Cyclophosphamide</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n    <tr>\n      <th>15</th>\n      <td>bb801681</td>\n      <td>6 cycles</td>\n      <td>Present</td>\n      <td>assertion</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.json_normalize(res['result'][0][\"assertions\"])"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.809791500Z",
     "start_time": "2024-07-13T04:52:29.803786Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "outputs": [
    {
     "data": {
      "text/plain": "                    relation chunk1_id                    chunk1 chunk2_id  \\\n0                          O  3576c965                   abdomen  d3e628e9   \n1                          O  cff2288c                    pelvis  d3e628e9   \n2             is_location_of  98848a68                   ovarian  d3e628e9   \n3              is_finding_of  3d8b6be0                 Pap smear  70affced   \n4             is_location_of  ac5748d2                     tumor  74e8e40b   \n5             is_location_of  ac5748d2                     tumor  76146911   \n6                          O  ac5748d2                     tumor  dc74e652   \n7        Chemotherapy-Dosage  c2e02074  Neoadjuvant chemotherapy  98f81754   \n8   Chemotherapy-Cycle_Count  c2e02074  Neoadjuvant chemotherapy  bb801681   \n9        Chemotherapy-Dosage  d5d30ff5          Cyclophosphamide  98f81754   \n10  Chemotherapy-Cycle_Count  d5d30ff5          Cyclophosphamide  bb801681   \n\n             chunk2  confidence direction  \n0              mass   0.9439166      both  \n1              mass   0.9611397      both  \n2              mass    0.922661      both  \n3    adenocarcinoma  0.52542114      both  \n4   fallopian tubes   0.9026299      both  \n5          appendix   0.6649267      both  \n6           omentum  0.80328876      both  \n7         500 mg/m2         1.0      both  \n8          6 cycles         1.0      both  \n9         500 mg/m2         1.0      both  \n10         6 cycles         1.0      both  ",
      "text/html": "<div>\n<style scoped>\n    .dataframe tbody tr th:only-of-type {\n        vertical-align: middle;\n    }\n\n    .dataframe tbody tr th {\n        vertical-align: top;\n    }\n\n    .dataframe thead th {\n        text-align: right;\n    }\n</style>\n<table border=\"1\" class=\"dataframe\">\n  <thead>\n    <tr style=\"text-align: right;\">\n      <th></th>\n      <th>relation</th>\n      <th>chunk1_id</th>\n      <th>chunk1</th>\n      <th>chunk2_id</th>\n      <th>chunk2</th>\n      <th>confidence</th>\n      <th>direction</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td>O</td>\n      <td>3576c965</td>\n      <td>abdomen</td>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>0.9439166</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>O</td>\n      <td>cff2288c</td>\n      <td>pelvis</td>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>0.9611397</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>is_location_of</td>\n      <td>98848a68</td>\n      <td>ovarian</td>\n      <td>d3e628e9</td>\n      <td>mass</td>\n      <td>0.922661</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>3</th>\n      <td>is_finding_of</td>\n      <td>3d8b6be0</td>\n      <td>Pap smear</td>\n      <td>70affced</td>\n      <td>adenocarcinoma</td>\n      <td>0.52542114</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>4</th>\n      <td>is_location_of</td>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>74e8e40b</td>\n      <td>fallopian tubes</td>\n      <td>0.9026299</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>5</th>\n      <td>is_location_of</td>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>76146911</td>\n      <td>appendix</td>\n      <td>0.6649267</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>6</th>\n      <td>O</td>\n      <td>ac5748d2</td>\n      <td>tumor</td>\n      <td>dc74e652</td>\n      <td>omentum</td>\n      <td>0.80328876</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>7</th>\n      <td>Chemotherapy-Dosage</td>\n      <td>c2e02074</td>\n      <td>Neoadjuvant chemotherapy</td>\n      <td>98f81754</td>\n      <td>500 mg/m2</td>\n      <td>1.0</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>8</th>\n      <td>Chemotherapy-Cycle_Count</td>\n      <td>c2e02074</td>\n      <td>Neoadjuvant chemotherapy</td>\n      <td>bb801681</td>\n      <td>6 cycles</td>\n      <td>1.0</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>9</th>\n      <td>Chemotherapy-Dosage</td>\n      <td>d5d30ff5</td>\n      <td>Cyclophosphamide</td>\n      <td>98f81754</td>\n      <td>500 mg/m2</td>\n      <td>1.0</td>\n      <td>both</td>\n    </tr>\n    <tr>\n      <th>10</th>\n      <td>Chemotherapy-Cycle_Count</td>\n      <td>d5d30ff5</td>\n      <td>Cyclophosphamide</td>\n      <td>bb801681</td>\n      <td>6 cycles</td>\n      <td>1.0</td>\n      <td>both</td>\n    </tr>\n  </tbody>\n</table>\n</div>"
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.json_normalize(res['result'][0][\"relations\"])"
   ],
   "metadata": {
    "collapsed": false,
    "ExecuteTime": {
     "end_time": "2024-07-13T04:52:29.839295500Z",
     "start_time": "2024-07-13T04:52:29.812789700Z"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
